Development of a DNA barcode library of plants in the Thai Herbal Pharmacopoeia and Monographs for authentication of herbal products

Traditional herbal medicine has long been practiced as a form of health care in many countries worldwide. The use of herbal products has been increasing and is expected to continue to do so. However, admixture and adulteration raise concerns about the quality of herbal medicines, including their safety and efficacy. We aimed to develop a reference DNA barcode library of plants listed in the Thai Herbal Pharmacopoeia (THP) and the Monographs of Selected Thai Materia Medica (TMM) (n = 101 plant species) using four core barcode regions, namely, ITS2, matK, rbcL and the trnH-psbA intergenic spacer, to authenticate the plant origin of raw materials and herbal products. Checking sequences from samples obtained from local markets and the Thai Food and Drug Administration (Thai FDA) against our digital reference DNA barcode system confirmed that eighteen of the twenty tested samples were authentic as claimed on their labels. The remaining two samples, no. 3 and 13, were not Cyanthillium cinereum (L.) H.Rob. and Pueraria candollei Wall. ex Benth. as claimed; they were identified as Emilia sonchifolia (L.) DC. and Butea superba (Roxb.), respectively. Hence, the Thai FDA and other regulatory agencies should promptly develop and strictly enforce pharmacopoeial standards, revise or modify existing regulatory guidelines, and closely monitor the authenticity of herbal products before they enter the market. The centralized digital reference DNA barcode database developed here could play an important role in monitoring and checking the authenticity of medicinal plants.


Results
DNA barcoding of selected plants in the THP and TMM. Genomic DNA was successfully extracted from all 101 plant species, belonging to 89 genera and 51 families (Table S1). The core DNA barcode regions, namely, ITS2, matK, rbcL and the trnH-psbA intergenic spacer, were amplified. Positive and negative control reactions gave the expected results in all PCRs, and all amplicons were clearly resolved as single bands of the expected size. The partial sequence lengths ranged from 228 to 278 bp (average 258 bp) for ITS2, 424 to 478 bp (average 450 bp) for matK, 540 to 580 bp (average 550 bp) for rbcL and 420 to 458 bp (average 428 bp) for the trnH-psbA intergenic spacer. All nucleotide sequences were submitted to NCBI GenBank, and their accession numbers are listed in Table 1.
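The length ranges above could also serve as a quick sanity check on new reads of the same regions. The sketch below assumes the observed ranges are used as an acceptance window, which is our illustrative choice rather than a published criterion; a read outside the window is only flagged for review. The function names and sample IDs are hypothetical.

```python
# Observed partial-sequence lengths (bp) reported for the reference library.
# Treating them as an acceptance window is an assumption for illustration.
REPORTED_RANGES = {
    "ITS2": (228, 278),
    "matK": (424, 478),
    "rbcL": (540, 580),
    "trnH-psbA": (420, 458),
}


def clean_length(sequence: str) -> int:
    """Length of a sequence after dropping whitespace and alignment gaps."""
    return len("".join(sequence.split()).replace("-", ""))


def within_reported_range(region: str, sequence: str) -> bool:
    """True if the cleaned length lies inside the reported range (inclusive)."""
    low, high = REPORTED_RANGES[region]
    return low <= clean_length(sequence) <= high


def flag_out_of_range(reads):
    """Return (sample_id, region, length) for every read outside its window.

    `reads` is an iterable of (sample_id, region, sequence) tuples;
    the sample IDs are hypothetical placeholders.
    """
    return [
        (sample_id, region, clean_length(seq))
        for sample_id, region, seq in reads
        if not within_reported_range(region, seq)
    ]


if __name__ == "__main__":
    reads = [("S1", "rbcL", "A" * 550), ("S2", "ITS2", "A" * 300)]
    print(flag_out_of_range(reads))  # → [('S2', 'ITS2', 300)]
```

Product-derived DNA is often degraded, so shorter reads may be expected for processed samples; a flag here is a prompt to inspect the trace, not a rejection.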
Authentication of herbal products. Genomic DNA was successfully isolated from all twenty herbal products, which covered different dosage forms (Fig. 1; Table S2), and the four barcode regions (ITS2, matK, rbcL and the trnH-psbA intergenic spacer) were amplified. The authenticity of all twenty single-herb formulation products was then tested against our reference DNA barcode database and by nucleotide Basic Local Alignment Search Tool (BLAST) searches of NCBI GenBank (Table S3). The results confirmed the authenticity of eighteen of the twenty samples. The sequences obtained from the other two samples, no. 3 and 13, both purchased from local markets, did not match the names on their labels (Table 2). Sample no. 3 was labeled as Cyanthillium cinereum and sample no. 13 as Pueraria candollei; however, our nucleotide BLAST results showed that samples no. 3 and 13 were Emilia sonchifolia and Butea superba, respectively. All samples provided by the Thai FDA matched their label claims. The NCBI GenBank nucleotide BLAST results for these samples are provided in Table 2.
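The label check described above reduces to picking a best hit and comparing its species with the label. A minimal sketch, assuming hits have already been parsed from BLAST+ tabular output (for example `-outfmt "6 qseqid sseqid pident qcovs bitscore"`) and each accession has been mapped to a species name. The ranking follows the Methods criteria (query coverage, then identity, then score); the 98% identity floor, the class and function names, and the placeholder species are illustrative assumptions, not the study's thresholds or data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    subject_species: str  # species name resolved from the hit accession (assumed)
    pident: float         # percent identity of the alignment
    qcovs: float          # query coverage per subject, in percent
    bitscore: float


def best_hit(hits: List[Hit]) -> Hit:
    """Rank by query coverage, then identity, then bit score; return the top hit."""
    if not hits:
        raise ValueError("no hits to rank")
    return max(hits, key=lambda h: (h.qcovs, h.pident, h.bitscore))


def check_label(label: str, hits: List[Hit], min_identity: float = 98.0) -> Tuple[str, str]:
    """Compare the labeled species with the best hit.

    Returns ("match" | "mismatch" | "inconclusive", best-hit species).
    `min_identity` is a hypothetical parameter, not taken from the study.
    """
    top = best_hit(hits)
    if top.pident < min_identity:
        return "inconclusive", top.subject_species
    if top.subject_species == label:
        return "match", top.subject_species
    return "mismatch", top.subject_species


if __name__ == "__main__":
    # Placeholder species and made-up scores, for illustration only.
    hits = [Hit("Species_B", 99.5, 100.0, 900.0), Hit("Species_A", 97.0, 100.0, 850.0)]
    print(check_label("Species_A", hits))  # → ('mismatch', 'Species_B')
```

Returning "inconclusive" below the floor keeps low-identity queries, such as species missing from GenBank, from being silently reported as mismatches.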
Maximum likelihood phylogenetic analysis. Maximum likelihood (ML) phylogenetic analysis of all reference plant species was performed using the ITS2, matK, rbcL and trnH-psbA regions. The unrooted phylogenetic tree of the rbcL region resolved clear clades, each representing a specific group of plant species (Fig. 2). Each color represents a monophyletic clade based on plant genera and families, reflecting their close phylogenetic relationships. Many of the species clusters belonged to the Asteraceae, Fabaceae, Lamiaceae, Rutaceae and Zingiberaceae families. Bootstrap support values were estimated from 1,000 replicates. These findings showed that the rbcL region-based phylogenetic tree can be used as an efficient …
Development of a centralized reference DNA barcode database. In this study, a centralized digital reference DNA barcode system for regulating herbal products was developed. The reference DNA barcode database incorporates voucher numbers, scientific names, common names, Thai names, plant habitats, collection forms, plant photographs, herbarium images and other information, such as collection dates, collection locations, collectors and taxonomists, along with geocoordinates (Fig. 3). All DNA barcode marker information, including genes, gene sequences and GenBank accession numbers, will be included in the database. By searching with the scientific name or Thai name, the end user can retrieve all the information for a particular plant. We established this digital database system and found it to be an efficient tool for systematically assessing traditional medicines and their herbal products and for connecting them with both national and international herbal trade regulators. This database system is a novel concept in Thai herbal development, and making it available to industry, consumers and researchers could bring a noticeable change in the regulation of the herbal trade.
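One way to hold the fields listed above is one record per voucher plus a name lookup. The sketch below is hypothetical: the class and field names are ours, not the database's actual schema, and the search matches scientific or Thai names exactly (case-insensitively) as a stand-in for the search option described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BarcodeMarker:
    region: str     # e.g. "rbcL"
    sequence: str
    accession: str  # GenBank accession number


@dataclass
class ReferenceRecord:
    voucher_number: str
    scientific_name: str
    common_name: str = ""
    thai_name: str = ""
    habitat: str = ""
    collection_form: str = ""
    collection_date: str = ""
    collection_location: str = ""
    collector: str = ""
    taxonomist: str = ""
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    image_paths: List[str] = field(default_factory=list)  # photos, herbarium images
    markers: Dict[str, BarcodeMarker] = field(default_factory=dict)  # keyed by region


class ReferenceLibrary:
    """In-memory stand-in for the reference database (hypothetical)."""

    def __init__(self) -> None:
        self._records: List[ReferenceRecord] = []

    def add(self, record: ReferenceRecord) -> None:
        self._records.append(record)

    def search(self, name: str) -> List[ReferenceRecord]:
        """Return records whose scientific or Thai name equals `name` (case-insensitive)."""
        key = name.strip().casefold()
        if not key:
            return []  # an empty query must not match records with blank names
        return [
            r for r in self._records
            if key in (r.scientific_name.casefold(), r.thai_name.casefold())
        ]
```

For example, after `lib.add(ReferenceRecord(voucher_number="V-0001", scientific_name="Cyanthillium cinereum", thai_name="ชื่อไทย"))` (a placeholder voucher and the literal Thai words for "Thai name"), `lib.search("cyanthillium cinereum")` returns that record. A production system would add fuzzy matching and synonym handling, which this sketch omits.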

Discussion
The global market for herbal drugs is large and grows every year. However, increasing demand leads to adulteration or substitution of raw materials 10,26,27. Many reported adverse reactions may be due to the consumption of unintended herbs, which has directly affected the marketing of herbal products 9,10,12,16,27. Various identification methods, including taxonomic, genomic and phytochemical methods, have been used to authenticate herbal products 28. In this study, DNA barcodes of 101 highly traded medicinal plants listed in the THP and TMM of Thailand were developed. Highly traded single-herb formulation products obtained from local markets and the Thai FDA, not restricted to closely related plant species, were tested for their authenticity. All herbal samples, irrespective of type, were analyzed using our own reference database together with nucleotide BLAST searches of NCBI GenBank. Because of the inherent limitations of single-locus DNA barcoding, emerging DNA-based or phylogenetic methods are needed to identify closely related plant species. The use of DNA as a source of information for identifying inaccurately labeled plant ingredients in herbal products is only beginning to be explored 9,10,16. Four core DNA barcode regions, namely, ITS2, matK, rbcL and the trnH-psbA intergenic spacer, were used to develop a reference DNA barcode library for testing the authenticity of twenty single-formulation herbal products. Our analysis indicated that all twenty samples were correct according to their labels, except samples no. 3 and no. 13, a powder labeled Cyanthillium cinereum and capsules labeled Pueraria candollei, respectively. Nucleotide BLAST results revealed that the Cyanthillium cinereum powder (sample no. 3) had been replaced by Emilia sonchifolia and that the Pueraria candollei capsules (sample no. 13) instead contained Butea superba. Similar morphologies and confusion of vernacular names could explain these replacements. Cyanthillium cinereum has high antioxidant activity 29 and is used in Thai medicine to reduce smoking withdrawal symptoms and to treat skin ailments, asthma, bronchitis, cough, cancer, malaria, gastrointestinal conditions, pain and diabetes, as well as to promote diuresis 30,31. Emilia sonchifolia is used as an anti-inflammatory and for the treatment of stomach tumors, ophthalmia, diarrhea, wounds, intestinal worm infections and bleeding piles 32. Pueraria candollei is used to relieve menopausal symptoms, including vasomotor symptoms, reproductive symptoms, depression and musculoskeletal pain, in estrogen-deficient women 33. Butea superba has been used for rejuvenation, for sexual arousal and to prevent erectile dysfunction 34. These results illustrate the extent of the problems that could arise from the use of inauthentic raw drugs in Thai medicine. Because NCBI GenBank contained no rbcL reference sequences matching sample no. 13, the Barcode of Life Data System (BOLD) database was used to analyze this sample. Both mislabeled samples were obtained from local markets. Herbal products purchased from local marketplaces may be more likely to contain adulterants or admixtures, especially in powdered form, because mixed powders are very difficult to differentiate. Previous reports have shown that powdered samples have a greater chance of admixture than other forms; for example, powdered ginger (Zingiber officinale Roscoe) has been found admixed with chili powder (Capsicum annuum L.) 35, as has powdered black pepper (Piper nigrum L.) 36.
For the purpose of this study, an ML phylogenetic tree of our reference plant species was constructed using all four DNA barcode regions. Among the markers, rbcL is highly conserved, and its sequence queries showed the highest identity with the expected plant species or closely related species. However, identification with this marker is unreliable if the taxonomic identity of the matching nucleotide sequence in GenBank is incorrect. Such issues can be resolved with a phylogenetic tree, in which incorrectly identified samples are highly likely to fall in unexpected clades 37. Our rbcL phylogenetic tree placed all plant species in appropriate clades or plant groups, as expected from the phylogenetic relationships among the species (Fig. 2). Therefore, taxonomic identification with the rbcL region at the species level was more reliable than with the other regions tested in this study. These results are consistent with previous reports that rbcL is a suitable candidate region for plant species identification 38,39. The utility of the rbcL region for discriminating land plants has previously been validated 40, and the use of rbcL has increased because of its high discrimination proportions at low taxonomic levels 39. In this study, the matK and trnH-psbA regions were unable to differentiate the plant species, and the ITS2 region gave similar results, with a few plants of the same genus clustering with different groups of plants (Fig. S1). Therefore, this study was restricted to the rbcL region-based ML phylogenetic tree; however, multilocus DNA barcoding techniques could be used as advanced tools for the accurate identification of medicinal plants.
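The "unexpected clade" check described above can be approximated, for illustration only, by asking whether a query's nearest reference belongs to the family it is supposed to belong to. The sketch below swaps the ML tree for a nearest-neighbor search on uncorrected p-distances over pre-aligned sequences; it is a simplification, not the phylogenetic method used in the study, and the function names and toy sequences are hypothetical.

```python
from typing import List, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences, ignoring gapped sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def placement_consistent(
    query: str, claimed_family: str, references: List[Tuple[str, str, str]]
) -> Tuple[bool, str, str]:
    """Check whether the nearest reference shares the claimed family.

    `references` holds (species, family, aligned_sequence) tuples.
    Returns (consistent, nearest_species, nearest_family).
    """
    species, family, _ = min(references, key=lambda ref: p_distance(query, ref[2]))
    return family == claimed_family, species, family


if __name__ == "__main__":
    refs = [("sp1", "Asteraceae", "ACGTACGTAC"), ("sp2", "Fabaceae", "TTGTACGAAC")]
    print(placement_consistent("ACGTACGTAA", "Fabaceae", refs))
    # → (False, 'sp1', 'Asteraceae')
```

A tree-based placement also weighs the surrounding topology and bootstrap support, which a single nearest neighbor cannot, so this is best read as a quick screen.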
Numerous cases of adulteration and substitution in herbal products have been reported worldwide, including in Thailand. In addition, the international herbal product supply chain often lacks the botanical expertise needed to provide suitable documentation for identifying raw herbal materials 37. Unfortunately, Thailand has no systematic regulatory mechanism for the quality control of herbal drugs before they enter the market, so it is very important to apply appropriate analytical techniques to herbal products. Through this study, we propose a centralized digital DNA barcode database to support the regulatory step of identifying the plants used in herbal products. This reference database incorporates voucher numbers, scientific and common names, Thai names, plant habitats, collection forms, plant photographs including herbarium images, and other information such as collection dates, collection locations and geocoordinates. Using scientific or common names, one can obtain all the information on a particular plant species or herbal product. This database could play a very important role in monitoring medicinal plants and the herbal trade and could ensure that all essential information is freely accessible to consumers and regulatory authorities in Thailand. Herbal testing centers and certification facilities would further enhance the quality control of herbal products and help regulate the national and international herbal trade. We plan to extend the library to other medicinal and non-medicinal plants available in Thailand. Future research will continue to validate and update the reference DNA barcode library and the protocol for analyzing herbal samples, and we will continue to certify the ingredients listed on herbal product labels using our reference DNA barcode database.

Conclusion
Admixture or adulteration in herbal products is one of the main problems in the herbal trade because herbal ingredients are challenging to identify. Hence, there is a pressing need for a reference DNA barcode library or centralized digital database system that can serve as a regulatory database for ensuring the safety and quality of traded herbs. The Thai FDA should promptly develop and strictly enforce pharmacopoeial standards and revise or modify existing regulatory guidelines so that the authenticity of raw materials and herbal products is checked before they enter the market. For the quality assessment of herbal products, we strongly recommend incorporating DNA-based methods into the THP and TMM to maintain the safety, quality and efficacy of herbal medicines before they reach the market.

Materials and methods
Plant materials and herbal products. Multiple … 41. Seventeen single-formulation herbal products from local herbal markets across Thailand and three herbal products from the Thai FDA were analyzed in this study. Herbal sample codes are listed in Table 2. Genomic DNA was extracted from the different dosage forms of the herbal products using a DNeasy Plant Mini Kit (Qiagen, Germany) and further purified using a GENECLEAN Kit (MP Biomedicals, France). DNA isolation from the herbal samples required multiple attempts to obtain good PCR amplification of the ITS2, matK, rbcL and trnH-psbA intergenic spacer regions. All PCR products were then sequenced as described above.

DNA isolation and PCR amplification.
DNA sequencing and phylogenetic analysis. The sequences were edited using BioEdit software (version 5.0.6). BLAST searches were run with the sequences as queries to determine their similarity to nucleotide sequences in NCBI GenBank. The sequences with the maximum query coverage, highest identity and maximum score were downloaded in FASTA format and included in our analysis. The ML method, with an appropriate model of nucleotide evolution, was used to infer the relationships among plant samples. The final alignment was imported into MEGA 7 to determine character information prior to phylogenetic analysis using the Kimura 2-parameter model of molecular evolution with 1,000 rapid bootstrap replicates 50.
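The Kimura 2-parameter model named above treats transitions (A↔G, C↔T) and transversions separately. Its pairwise distance, with P and Q the proportions of compared sites showing transitional and transversional differences, is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The sketch below computes that distance for two aligned sequences; it only illustrates what the model's two parameters capture and does not reproduce the ML tree search or bootstrapping in MEGA 7. Skipping gaps and ambiguous bases site by site (pairwise deletion) is our assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes (pairwise deletion)
        sites += 1
        if x == y:
            continue
        # A<->G or C<->T is a transition; any other change is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # One transition across four sites: d = 0.5 * ln 2 ≈ 0.3466
    print(round(k2p_distance("AAAA", "GAAA"), 4))  # → 0.3466
```

Separating P from Q matters because transitions typically accumulate faster than transversions, and a single uncorrected mismatch proportion would blur that difference.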